Salary Based Filtering

(72498, 131)
['ID', 'LAST_UPDATED_DATE', 'LAST_UPDATED_TIMESTAMP', 'DUPLICATES', 'POSTED', 'EXPIRED', 'DURATION', 'SOURCE_TYPES', 'SOURCES', 'URL', 'ACTIVE_URLS', 'ACTIVE_SOURCES_INFO', 'TITLE_RAW', 'BODY', 'MODELED_EXPIRED', 'MODELED_DURATION', 'COMPANY', 'COMPANY_NAME', 'COMPANY_RAW', 'COMPANY_IS_STAFFING', 'EDUCATION_LEVELS', 'EDUCATION_LEVELS_NAME', 'MIN_EDULEVELS', 'MIN_EDULEVELS_NAME', 'MAX_EDULEVELS', 'MAX_EDULEVELS_NAME', 'EMPLOYMENT_TYPE', 'EMPLOYMENT_TYPE_NAME', 'MIN_YEARS_EXPERIENCE', 'MAX_YEARS_EXPERIENCE', 'IS_INTERNSHIP', 'SALARY', 'REMOTE_TYPE', 'REMOTE_TYPE_NAME', 'ORIGINAL_PAY_PERIOD', 'SALARY_TO', 'SALARY_FROM', 'LOCATION', 'CITY', 'CITY_NAME', 'COUNTY', 'COUNTY_NAME', 'MSA', 'MSA_NAME', 'STATE', 'STATE_NAME', 'COUNTY_OUTGOING', 'COUNTY_NAME_OUTGOING', 'COUNTY_INCOMING', 'COUNTY_NAME_INCOMING', 'MSA_OUTGOING', 'MSA_NAME_OUTGOING', 'MSA_INCOMING', 'MSA_NAME_INCOMING', 'NAICS2', 'NAICS2_NAME', 'NAICS3', 'NAICS3_NAME', 'NAICS4', 'NAICS4_NAME', 'NAICS5', 'NAICS5_NAME', 'NAICS6', 'NAICS6_NAME', 'TITLE', 'TITLE_NAME', 'TITLE_CLEAN', 'SKILLS', 'SKILLS_NAME', 'SPECIALIZED_SKILLS', 'SPECIALIZED_SKILLS_NAME', 'CERTIFICATIONS', 'CERTIFICATIONS_NAME', 'COMMON_SKILLS', 'COMMON_SKILLS_NAME', 'SOFTWARE_SKILLS', 'SOFTWARE_SKILLS_NAME', 'ONET', 'ONET_NAME', 'ONET_2019', 'ONET_2019_NAME', 'CIP6', 'CIP6_NAME', 'CIP4', 'CIP4_NAME', 'CIP2', 'CIP2_NAME', 'SOC_2021_2', 'SOC_2021_2_NAME', 'SOC_2021_3', 'SOC_2021_3_NAME', 'SOC_2021_4', 'SOC_2021_4_NAME', 'SOC_2021_5', 'SOC_2021_5_NAME', 'LOT_CAREER_AREA', 'LOT_CAREER_AREA_NAME', 'LOT_OCCUPATION', 'LOT_OCCUPATION_NAME', 'LOT_SPECIALIZED_OCCUPATION', 'LOT_SPECIALIZED_OCCUPATION_NAME', 'LOT_OCCUPATION_GROUP', 'LOT_OCCUPATION_GROUP_NAME', 'LOT_V6_SPECIALIZED_OCCUPATION', 'LOT_V6_SPECIALIZED_OCCUPATION_NAME', 'LOT_V6_OCCUPATION', 'LOT_V6_OCCUPATION_NAME', 'LOT_V6_OCCUPATION_GROUP', 'LOT_V6_OCCUPATION_GROUP_NAME', 'LOT_V6_CAREER_AREA', 'LOT_V6_CAREER_AREA_NAME', 'SOC_2', 'SOC_2_NAME', 'SOC_3', 'SOC_3_NAME', 'SOC_4', 'SOC_4_NAME', 'SOC_5', 
'SOC_5_NAME', 'LIGHTCAST_SECTORS', 'LIGHTCAST_SECTORS_NAME', 'NAICS_2022_2', 'NAICS_2022_2_NAME', 'NAICS_2022_3', 'NAICS_2022_3_NAME', 'NAICS_2022_4', 'NAICS_2022_4_NAME', 'NAICS_2022_5', 'NAICS_2022_5_NAME', 'NAICS_2022_6', 'NAICS_2022_6_NAME']
shape: (3, 131)
ID LAST_UPDATED_DATE LAST_UPDATED_TIMESTAMP DUPLICATES POSTED EXPIRED DURATION SOURCE_TYPES SOURCES URL ACTIVE_URLS ACTIVE_SOURCES_INFO TITLE_RAW BODY MODELED_EXPIRED MODELED_DURATION COMPANY COMPANY_NAME COMPANY_RAW COMPANY_IS_STAFFING EDUCATION_LEVELS EDUCATION_LEVELS_NAME MIN_EDULEVELS MIN_EDULEVELS_NAME MAX_EDULEVELS MAX_EDULEVELS_NAME EMPLOYMENT_TYPE EMPLOYMENT_TYPE_NAME MIN_YEARS_EXPERIENCE MAX_YEARS_EXPERIENCE IS_INTERNSHIP SALARY REMOTE_TYPE REMOTE_TYPE_NAME ORIGINAL_PAY_PERIOD SALARY_TO SALARY_FROM SOC_2021_5_NAME LOT_CAREER_AREA LOT_CAREER_AREA_NAME LOT_OCCUPATION LOT_OCCUPATION_NAME LOT_SPECIALIZED_OCCUPATION LOT_SPECIALIZED_OCCUPATION_NAME LOT_OCCUPATION_GROUP LOT_OCCUPATION_GROUP_NAME LOT_V6_SPECIALIZED_OCCUPATION LOT_V6_SPECIALIZED_OCCUPATION_NAME LOT_V6_OCCUPATION LOT_V6_OCCUPATION_NAME LOT_V6_OCCUPATION_GROUP LOT_V6_OCCUPATION_GROUP_NAME LOT_V6_CAREER_AREA LOT_V6_CAREER_AREA_NAME SOC_2 SOC_2_NAME SOC_3 SOC_3_NAME SOC_4 SOC_4_NAME SOC_5 SOC_5_NAME LIGHTCAST_SECTORS LIGHTCAST_SECTORS_NAME NAICS_2022_2 NAICS_2022_2_NAME NAICS_2022_3 NAICS_2022_3_NAME NAICS_2022_4 NAICS_2022_4_NAME NAICS_2022_5 NAICS_2022_5_NAME NAICS_2022_6 NAICS_2022_6_NAME
str str str i64 str str i64 str str str str str str str str i64 i64 str str bool str str i64 str i64 str i64 str i64 i64 bool i64 i64 str str i64 i64 str i64 str i64 str i64 str i64 str i64 str i64 str i64 str i64 str str str str str str str str str str str i64 str i64 str i64 str i64 str i64 str
"1f57d95acf4dc67ed2819eb12f049f… "9/6/2024" "2024-09-06 20:32:57.352 Z" 0 "6/2/2024" "6/8/2024" 6 "[   "Company" ]" "[   "brassring.com" ]" "[   "https://sjobs.brassring.c… "[]" null "Enterprise Analyst (II-III)" "31-May-2024 Enterprise Analys… "6/8/2024" 6 894731 "Murphy USA" "Murphy USA" false "[   2 ]" "[   "Bachelor's degree" ]" 2 "Bachelor's degree" null null 1 "Full-time (> 32 hours)" 2 2 false null 0 "[None]" null null null "Data Scientists" 23 "Information Technology and Com… 231010 "Business Intelligence Analyst" 23101011 "General ERP Analyst / Consulta… 2310 "Business Intelligence" 23101011 "General ERP Analyst / Consulta… 231010 "Business Intelligence Analyst" 2310 "Business Intelligence" 23 "Information Technology and Com… "15-0000" "Computer and Mathematical Occu… "15-2000" "Mathematical Science Occupatio… "15-2050" "Data Scientists" "15-2051" "Data Scientists" "[   7 ]" "[   "Artificial Intelligence" … 44 "Retail Trade" 441 "Motor Vehicle and Parts Dealer… 4413 "Automotive Parts, Accessories,… 44133 "Automotive Parts and Accessori… 441330 "Automotive Parts and Accessori…
"0cb072af26757b6c4ea9464472a50a… "8/2/2024" "2024-08-02 17:08:58.838 Z" 0 "6/2/2024" "8/1/2024" null "[   "Job Board" ]" "[   "maine.gov" ]" "[   "https://joblink.maine.gov… "[]" null "Oracle Consultant - Reports (3… "Oracle Consultant - Reports (3… "8/1/2024" null 133098 "Smx Corporation Limited" "SMX" true "[   99 ]" "[   "No Education Listed" ]" 99 "No Education Listed" null null 1 "Full-time (> 32 hours)" 3 3 false null 1 "Remote" null null null "Data Scientists" 23 "Information Technology and Com… 231010 "Business Intelligence Analyst" 23101012 "Oracle Consultant / Analyst" 2310 "Business Intelligence" 23101012 "Oracle Consultant / Analyst" 231010 "Business Intelligence Analyst" 2310 "Business Intelligence" 23 "Information Technology and Com… "15-0000" "Computer and Mathematical Occu… "15-2000" "Mathematical Science Occupatio… "15-2050" "Data Scientists" "15-2051" "Data Scientists" null null 56 "Administrative and Support and… 561 "Administrative and Support Ser… 5613 "Employment Services" 56132 "Temporary Help Services" 561320 "Temporary Help Services"
"85318b12b3331fa490d32ad014379d… "9/6/2024" "2024-09-06 20:32:57.352 Z" 1 "6/2/2024" "7/7/2024" 35 "[   "Job Board" ]" "[   "dejobs.org" ]" "[   "https://dejobs.org/dallas… "[]" null "Data Analyst" "Taking care of people is at th… "6/10/2024" 8 39063746 "Sedgwick" "Sedgwick" false "[   2 ]" "[   "Bachelor's degree" ]" 2 "Bachelor's degree" null null 1 "Full-time (> 32 hours)" 5 null false null 0 "[None]" null null null "Data Scientists" 23 "Information Technology and Com… 231113 "Data / Data Mining Analyst" 23111310 "Data Analyst" 2311 "Data Analysis and Mathematics" 23111310 "Data Analyst" 231113 "Data / Data Mining Analyst" 2311 "Data Analysis and Mathematics" 23 "Information Technology and Com… "15-0000" "Computer and Mathematical Occu… "15-2000" "Mathematical Science Occupatio… "15-2050" "Data Scientists" "15-2051" "Data Scientists" null null 52 "Finance and Insurance" 524 "Insurance Carriers and Related… 5242 "Agencies, Brokerages, and Othe… 52429 "Other Insurance Related Activi… 524291 "Claims Adjusting"
(30808, 131)

To ensure data quality, we filtered out records with missing values in the SALARY column. This reduced the working dataset from 72,498 to 30,808 rows.

shape: (1, 5)
┌───────────────┬───────────────┬────────────┬────────────┬──────────────┐
│ mean_salary   ┆ median_salary ┆ min_salary ┆ max_salary ┆ std_salary   │
│ ---           ┆ ---           ┆ ---        ┆ ---        ┆ ---          │
│ f64           ┆ f64           ┆ i64        ┆ i64        ┆ f64          │
╞═══════════════╪═══════════════╪════════════╪════════════╪══════════════╡
│ 117953.755031 ┆ 116300.0      ┆ 15860      ┆ 500000     ┆ 45133.878359 │
└───────────────┴───────────────┴────────────┴────────────┴──────────────┘

Annual Salary Distribution - Box Plot

The box plot above provides a visual summary of the annual salary distribution within the dataset after filtering for non-null entries. The central box represents the interquartile range (IQR), which captures the middle 50% of salaries in the dataset. The line within the box denotes the median salary, which is approximately $116,300.

Key observations:

  • The minimum salary (excluding outliers) is slightly above $15,000, while the maximum non-outlier salary is just below $230,000.
  • Numerous data points are plotted as individual markers above the upper whisker, identifying them as outliers. These salaries extend up to $500,000, indicating the presence of extreme high-paying roles.
  • The interquartile range (IQR) suggests that most salaries are concentrated between $80,000 and $150,000, which is a critical salary band for the job market analysis.
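The whiskers and outlier markers in a box plot are usually derived from the quartiles. The sketch below assumes the conventional 1.5 × IQR rule (the report does not state which rule its plotting library used) and runs on made-up salaries rather than the real data.

```python
import statistics

# Made-up salaries for illustration only.
salaries = [52_000, 80_000, 95_000, 110_000, 116_000,
            125_000, 150_000, 170_000, 400_000]

# "inclusive" quartiles use linear interpolation over the observed range.
q1, median, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1

# Assumed 1.5 * IQR fences: points outside them are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

# Whiskers end at the most extreme points that are still inside the fences.
inside = [s for s in salaries if lower_fence <= s <= upper_fence]
whiskers = (min(inside), max(inside))

print(f"IQR: {q1:,.0f} to {q3:,.0f}, whiskers: {whiskers}, outliers: {outliers}")
```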
(28971, 131)

To ensure robust analysis and mitigate the impact of extreme values, salaries below $50,000 and above $230,000 were filtered out prior to regression modeling, leaving 28,971 rows. This decision helps reduce skewness and prevents outliers from disproportionately influencing predictions.


Handling Missing Experience Values

MIN_YEARS_EXPERIENCE NaN's: 6263 (21.62%)
Median MIN_YEARS_EXPERIENCE: 5.0

Next, we analyzed missing values in the MIN_YEARS_EXPERIENCE column. About 21.6% of the entries (6,263 rows) were missing. These were imputed with the median value (5 years), which keeps every row in the dataset and leaves the column's median unchanged, although it does reduce the column's variance.

shape: (1, 1)
┌──────────────────────┐
│ MIN_YEARS_EXPERIENCE │
│ ---                  │
│ u32                  │
╞══════════════════════╡
│ 0                    │
└──────────────────────┘

Exploring the ONET Category


Unique value counts for 'ONET':
shape: (1, 2)
┌────────────┬───────┐
│ ONET       ┆ count │
│ ---        ┆ ---   │
│ str        ┆ u32   │
╞════════════╪═══════╡
│ 15-2051.01 ┆ 28971 │
└────────────┴───────┘

Unique value counts for 'ONET_NAME':
shape: (1, 2)
┌────────────────────────────────┬───────┐
│ ONET_NAME                      ┆ count │
│ ---                            ┆ ---   │
│ str                            ┆ u32   │
╞════════════════════════════════╪═══════╡
│ Business Intelligence Analysts ┆ 28971 │
└────────────────────────────────┴───────┘

Unique value counts for 'ONET_2019':
shape: (1, 2)
┌────────────┬───────┐
│ ONET_2019  ┆ count │
│ ---        ┆ ---   │
│ str        ┆ u32   │
╞════════════╪═══════╡
│ 15-2051.01 ┆ 28971 │
└────────────┴───────┘

Unique value counts for 'ONET_2019_NAME':
shape: (1, 2)
┌────────────────────────────────┬───────┐
│ ONET_2019_NAME                 ┆ count │
│ ---                            ┆ ---   │
│ str                            ┆ u32   │
╞════════════════════════════════╪═══════╡
│ Business Intelligence Analysts ┆ 28971 │
└────────────────────────────────┴───────┘

We focused our analysis on a specific ONET category: Business Intelligence Analysts (ONET code 15-2051.01). All 28,971 rows in the filtered dataset fall under this classification: each of ONET, ONET_NAME, ONET_2019, and ONET_2019_NAME takes a single unique value, which confirms the consistency of this subset.


Analyzing Job Titles

shape: (10_107, 1)
┌─────────────────────────────────┐
│ TITLE_CLEAN                     │
│ ---                             │
│ str                             │
╞═════════════════════════════════╡
│ analyst data day                │
│ sap functional f s hana implem… │
│ data modeler michigan           │
│ data analyst sql powerbi        │
│ merch planner rtw womens plus … │
│ …                               │
│ signals data analyst mid        │
│ data quality sr analyst         │
│ bcba telehealth                 │
│ regulatory capital reporting a… │
│ senior analyst advanced analyt… │
└─────────────────────────────────┘
shape: (10, 2)
┌─────────────────────────────────┬──────┐
│ TITLE_CLEAN                     ┆ len  │
│ ---                             ┆ ---  │
│ str                             ┆ u32  │
╞═════════════════════════════════╪══════╡
│ data analyst                    ┆ 2077 │
│ business intelligence analyst   ┆ 330  │
│ senior data analyst             ┆ 307  │
│ enterprise architect            ┆ 294  │
│ oracle hcm cloud implementatio… ┆ 134  │
│ data and reporting professiona… ┆ 120  │
│ lead data analyst               ┆ 113  │
│ solution architect              ┆ 108  │
│ sr data analyst                 ┆ 105  │
│ data analytics engineer         ┆ 98   │
└─────────────────────────────────┴──────┘
shape: (10, 1)
┌─────────────────────────────────┐
│ BODY                            │
│ ---                             │
│ str                             │
╞═════════════════════════════════╡
│ Comisiones de $1000 - $3000 po… │
│ About Lumen                     │
│                                 │
│ Lumen connects th…              │
│ Sr. Marketing Analyst           │
│ United S…                       │
│ Data Analyst In Ridgecrest At … │
│ Power, Utilities & Renewables … │
│ Sr. Enterprise Data Architectu… │
│ Job Description: We Are: Accen… │
│ Principal growth data analyst … │
│ About Lumen                     │
│                                 │
│ Lumen connects th…              │
│ Senior Enterprise Architect (r… │
└─────────────────────────────────┘

Bar Plot: Top 10 Business Intelligence Analyst Role Titles

This bar chart illustrates the frequency distribution of the top 10 most common job titles under the ONET category “Business Intelligence Analysts.” The data was extracted from job postings where ONET classification was available and grouped by the TITLE_CLEAN field.

The most frequently occurring role by a significant margin is “Data Analyst” with over 2,000 occurrences, followed by “Business Intelligence Analyst,” “Senior Data Analyst,” and “Enterprise Architect.” This suggests that the term “Data Analyst” is more widely used in job postings, even when referring to more specialized roles such as business intelligence positions.

The diversity of titles such as “Oracle HCM Cloud Implementation Lead,” “Lead Data Analyst,” and “Solution Architect” also reflects the multidisciplinary nature of business intelligence roles, spanning technical infrastructure, enterprise systems, and analytics.

This visualization supports the idea that while many job functions may fall under the business intelligence umbrella, employers label them in varied ways, highlighting the importance for job seekers to use a broad range of keywords when searching for roles in this domain.

Data Cleaning and Refinement of Education-Level Information


Jobs_lgt_filter.select([
    pl.col("MIN_EDULEVELS"),
    pl.col("MIN_EDULEVELS_NAME")
]).unique().sort("MIN_EDULEVELS")
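shape: (6, 2)
┌───────────────┬──────────────────────────────┐
│ MIN_EDULEVELS ┆ MIN_EDULEVELS_NAME           │
│ ---           ┆ ---                          │
│ i64           ┆ str                          │
╞═══════════════╪══════════════════════════════╡
│ 0             ┆ High school or GED           │
│ 1             ┆ Associate degree             │
│ 2             ┆ Bachelor's degree            │
│ 3             ┆ Master's degree              │
│ 4             ┆ Ph.D. or professional degree │
│ 99            ┆ No Education Listed          │
└───────────────┴──────────────────────────────┘
shape: (1, 2)
┌─────────────────────┬──────────────────────────┐
│ Nulls_MIN_EDULEVELS ┆ Nulls_MIN_EDULEVELS_NAME │
│ ---                 ┆ ---                      │
│ u32                 ┆ u32                      │
╞═════════════════════╪══════════════════════════╡
│ 0                   ┆ 0                        │
└─────────────────────┴──────────────────────────┘
shape: (6, 2)
┌──────────────────────────────┬───────┐
│ MIN_EDULEVELS_NAME           ┆ len   │
│ ---                          ┆ ---   │
│ str                          ┆ u32   │
╞══════════════════════════════╪═══════╡
│ Bachelor's degree            ┆ 18758 │
│ No Education Listed          ┆ 6255  │
│ Associate degree             ┆ 1752  │
│ High school or GED           ┆ 1351  │
│ Master's degree              ┆ 820   │
│ Ph.D. or professional degree ┆ 35    │
└──────────────────────────────┴───────┘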
(22716, 131)
  • To assess the role of education in job salary trends, we began by exploring the MIN_EDULEVELS and MIN_EDULEVELS_NAME columns. These columns represent the minimum educational qualifications required for each job posting. The dataset includes a range of educational levels, from “High school or GED” to “Ph.D. or professional degree”, along with a placeholder value 99 indicating “No Education Listed”.

  • A quick check confirmed that there were no missing (NaN) values in either column, which eliminated the need for imputation or data repair.

  • We then examined the frequency distribution of different education levels. It was observed that “Bachelor’s degree” was the most commonly required qualification, followed by “No Education Listed” and “Associate degree”.

  • To ensure analytical clarity and avoid skewing insights with ambiguous values, we removed rows where the education level was listed as “No Education Listed” (coded as 99). This helped refine the dataset to include only those records where education expectations were clearly stated, ensuring better model training and more interpretable results in subsequent steps.


Random Forest Model

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=999)
rf.fit(X_train, y_train)
RandomForestRegressor(random_state=999)

Feature Selection and Model Training

To predict salary outcomes, we selected a set of relevant features including MIN_YEARS_EXPERIENCE, MIN_EDULEVELS, and the top 10 job titles obtained from the cleaned TITLE_CLEAN column through one-hot encoding. The dataset was split into training and testing sets with a 70/30 ratio using a fixed random state to ensure reproducibility. We implemented a Random Forest Regressor with 100 estimators.
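The pipeline described above can be sketched end to end on synthetic data. Everything about the data here is an assumption: the salary formula, the two stand-in titles, and the split's `random_state=42` (the report says a fixed state was used but not which one). Only the 70/30 ratio, the 100 estimators, and `random_state=999` for the forest come from the report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Two stand-in titles play the role of the report's top 10 titles.
top_titles = ["data analyst", "enterprise architect"]
titles = rng.choice(top_titles + ["other title"], size=n)
experience = rng.integers(0, 11, size=n)  # MIN_YEARS_EXPERIENCE
education = rng.integers(0, 5, size=n)    # MIN_EDULEVELS codes 0..4

# One-hot encode only the top titles; any other title becomes all zeros.
title_dummies = np.column_stack([(titles == t).astype(int) for t in top_titles])
X = np.column_stack([experience, education, title_dummies])

# Synthetic salaries, invented purely so the example runs.
y = (
    60_000
    + 6_000 * experience
    + 5_000 * education
    + 30_000 * title_dummies[:, 1]
    + rng.normal(0, 5_000, size=n)
)

# 70/30 split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rf = RandomForestRegressor(n_estimators=100, random_state=999)
rf.fit(X_train, y_train)

print(X_train.shape, X_test.shape)
```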

Feature Importance Visualization

A bar chart was generated to visualize the importance of each feature in predicting salary. MIN_YEARS_EXPERIENCE emerged as the most influential feature, followed by titles such as enterprise architect, data analyst, and oracle hcm cloud implementation lead core hr module. This suggests that experience and specific job roles have a strong influence on salary prediction, while education level (MIN_EDULEVELS) had a smaller impact in the model.
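A ranking like the one in the chart can be read from a fitted forest's `feature_importances_` attribute, which holds scikit-learn's impurity-based importances. Whether the report used this attribute or another method is not stated, so treat the sketch below as one plausible approach; the data is synthetic and built so that experience dominates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 300
feature_names = ["MIN_YEARS_EXPERIENCE", "MIN_EDULEVELS"]

experience = rng.integers(0, 11, size=n)
education = rng.integers(0, 5, size=n)
X = np.column_stack([experience, education])

# Synthetic salaries driven mostly by experience (an assumption for illustration).
y = 60_000 + 8_000 * experience + 1_000 * education + rng.normal(0, 2_000, size=n)

rf = RandomForestRegressor(n_estimators=100, random_state=999).fit(X, y)

# Pair each feature with its importance and sort from most to least important.
ranked = sorted(
    zip(feature_names, rf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```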

=== Random Forest Regression Performance ===
R² Score:               0.5959
Mean Absolute Error:    15,409.01
Mean Squared Error:     513,392,249.80
Root Mean Squared Error: 22,658.16

Model Performance Evaluation

The performance of the Random Forest Regression model was evaluated using the following metrics:

  • R² Score: 0.5959

  • Mean Absolute Error (MAE): 15,409.01

  • Mean Squared Error (MSE): 513,392,249.80

  • Root Mean Squared Error (RMSE): 22,658.16

These results indicate that while the model captures general salary trends based on experience and job role, there is room for improvement, potentially through feature engineering, additional variables, or advanced tuning techniques.
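For reference, the four metrics can be computed directly from their definitions. The sketch below uses four invented salary pairs and a hypothetical helper named `regression_metrics`; it also checks that the reported RMSE is the square root of the reported MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, MSE, and RMSE computed from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


# Invented values for illustration only.
y_true = [80_000, 100_000, 120_000, 140_000]
y_pred = [90_000, 95_000, 125_000, 130_000]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# Consistency check on the reported figures: RMSE = sqrt(MSE).
reported_rmse = math.sqrt(513_392_249.80)
print(f"{reported_rmse:,.2f}")
```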

Interpretation

  • The R² score of 0.5959 indicates that the model explains about 60% of the variance in salaries, a moderate fit.

  • The residual error values suggest variability in salary predictions, potentially due to unobserved variables such as company reputation, geographic location, or job-specific skills.


Residual Plot for the Random Forest Model

The residual plot above visualizes the difference between the actual salaries and the predicted salaries generated by the Random Forest Regressor. On the Y-axis, we have the residuals, which are calculated as:

Residual = Actual Salary - Predicted Salary

On the X-axis, we have the actual salary values. The horizontal red dashed line at Y=0 represents the ideal line where residuals would lie if predictions were perfect.

Observations:

  • Most residuals are centered around the zero line, which indicates that the model’s predictions are generally close to the actual values.
  • However, there is a noticeable spread of residuals at both the lower and higher ends of the salary range. This suggests heteroscedasticity, meaning the variance of the residuals is not constant across the salary range.
  • Some outliers are visible, where the predicted salary significantly underestimates or overestimates the actual salary by large margins.
  • The plot maintains a relatively linear pattern without any distinct curvature, which is desirable as it supports the assumption that the model captures the relationship without strong systematic bias.
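One simple way to check the visual impression of uneven spread is to compare the standard deviation of residuals across salary bands. The sketch below uses invented salaries and an arbitrary $100,000 cut point; neither comes from the report.

```python
import statistics

# Invented actual and predicted salaries for illustration only.
actual = [60_000, 65_000, 70_000, 150_000, 160_000, 170_000]
predicted = [62_000, 64_000, 69_000, 130_000, 165_000, 190_000]

# Residual = Actual Salary - Predicted Salary
residuals = [a - p for a, p in zip(actual, predicted)]


def band(salary):
    """Assign a salary to one of two illustrative bands."""
    return "below_100k" if salary < 100_000 else "100k_and_up"


# Group residuals by the band of the actual salary.
by_band = {}
for a, r in zip(actual, residuals):
    by_band.setdefault(band(a), []).append(r)

# A much larger spread in one band points to non-constant residual variance.
spread = {name: statistics.pstdev(values) for name, values in by_band.items()}
print(spread)
```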

Conclusion:

While the model performs reasonably well for most observations, the presence of outliers and increasing residual spread at higher salary ranges indicate that the model may benefit from additional feature engineering or alternative modeling techniques to improve accuracy across the entire range.


Salary Categorization and Confusion Matrix Interpretation

To enhance interpretability of salary prediction outputs, we categorized the continuous salary values into three distinct bins:

  • Low: Salaries less than $60,000
  • Medium: Salaries between $60,000 and $120,000
  • High: Salaries greater than $120,000

These bins were applied to both the actual test set and the model’s predicted salaries. A confusion matrix was then constructed to assess the model’s classification performance across these categories.
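The binning and the confusion matrix can be sketched in plain Python. The treatment of the boundaries (exactly $60,000 and exactly $120,000 both count as Medium) is an assumption read from the bin wording, and the salaries are invented.

```python
from collections import Counter

LABELS = ["Low", "Medium", "High"]


def salary_band(salary):
    """Map a salary to Low / Medium / High using the report's thresholds."""
    if salary < 60_000:
        return "Low"
    if salary <= 120_000:
        return "Medium"
    return "High"


def confusion_matrix(actual, predicted):
    """Rows are actual bands, columns are predicted bands, in LABELS order."""
    counts = Counter(
        (salary_band(a), salary_band(p)) for a, p in zip(actual, predicted)
    )
    return [[counts[(row, col)] for col in LABELS] for row in LABELS]


# Invented values for illustration only.
actual = [55_000, 58_000, 90_000, 110_000, 130_000, 150_000]
predicted = [70_000, 65_000, 95_000, 125_000, 115_000, 145_000]

matrix = confusion_matrix(actual, predicted)
for label, row in zip(LABELS, matrix):
    print(f"{label:<6} {row}")
```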

Confusion Matrix Summary:

  • Low-End Salaries: None were correctly classified. All 40 instances were incorrectly predicted as belonging to the “Medium” category.
  • Average (Medium) Salaries: Out of 612 actual medium salary records, 587 were correctly classified, and 25 were misclassified as “High”.
  • High Earners: Among 230 actual high-salary records, 155 were correctly classified, while 75 were misclassified as “Medium”.
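The counts in this summary translate into per-class recall and overall accuracy as follows. The matrix below is reconstructed from the figures above, assuming that no record was predicted as "Low" (the row totals of 40, 612, and 230 leave no room for any).

```python
LABELS = ["Low", "Medium", "High"]

# Rows: actual band, columns: predicted band (Low, Medium, High).
matrix = [
    [0, 40, 0],     # actual Low: all 40 predicted as Medium
    [0, 587, 25],   # actual Medium: 587 correct, 25 predicted as High
    [0, 75, 155],   # actual High: 75 predicted as Medium, 155 correct
]

# Recall per class: correct predictions divided by actual records in that class.
recall = {
    label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(LABELS)
}

# Overall accuracy: diagonal total divided by all records.
total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(LABELS))) / total

print({label: round(value, 3) for label, value in recall.items()})
print(round(accuracy, 3))
```

The overall accuracy looks high mainly because the Medium class dominates, which is why the per-class view is more informative here.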

Interpretation:

  • The model demonstrates strong predictive performance for the medium salary category, which could be attributed to its higher representation in the training data.
  • However, it struggles to identify low-end salary jobs, consistently misclassifying them as medium. This could be due to a class imbalance or model bias toward the center of the salary range.
  • High salaries are partially misclassified, likely due to overlap in feature values with medium-salary roles.

This analysis highlights the need for further feature engineering or possibly applying class balancing techniques to improve classification performance across all salary categories.


Salary Predictions Using Random Forest Model

The Random Forest Regressor model was utilized to predict salaries for four hypothetical candidates based on their experience, education level, and job title. Below is a summary and interpretation of the predicted results:
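Scoring a hypothetical candidate amounts to building one feature row in the training column order and calling `predict`. The sketch below trains on synthetic data with a single title indicator, so its numbers will not match the report's; the education codes 2 (Bachelor's) and 3 (Master's) follow the MIN_EDULEVELS table, and everything else is assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 400

experience = rng.integers(0, 11, size=n)    # MIN_YEARS_EXPERIENCE
education = rng.integers(0, 5, size=n)      # MIN_EDULEVELS codes 0..4
is_architect = rng.integers(0, 2, size=n)   # stand-in for a one-hot title column
X = np.column_stack([experience, education, is_architect])

# Synthetic salaries, invented purely so the example runs.
y = (
    55_000
    + 7_000 * experience
    + 6_000 * education
    + 25_000 * is_architect
    + rng.normal(0, 3_000, size=n)
)

rf = RandomForestRegressor(n_estimators=100, random_state=999).fit(X, y)

# Each scenario is one row in the same column order used for training.
scenarios = {
    "analyst, bachelor's, 1 year": [1, 2, 0],
    "architect, master's, 7 years": [7, 3, 1],
}
rows = np.array(list(scenarios.values()))
predictions = dict(zip(scenarios, rf.predict(rows)))

for name, salary in predictions.items():
    print(f"{name}: ${salary:,.2f}")
```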

Predicted Salary 01: $74,467.49

Scenario 1:

  • Entry-level Data Analyst (Bachelor’s Degree, 1 Year of Experience)
  • Predicted Salary: $74,467.49

This prediction reflects an early-career role, showing a relatively lower salary aligned with market expectations for individuals with minimal experience and a bachelor’s degree.

Predicted Salary 02: $123,480.98

Scenario 2:

  • Entry-level Data Analyst (Master’s Degree, 1 Year of Experience)
  • Predicted Salary: $123,480.98

With the same experience level as Scenario 1 but a higher education qualification, the model predicts a substantially higher salary (about $49,000 more). In this model, moving from a bachelor’s to a master’s degree raises the predicted salary considerably for this title, although a single pair of predictions does not establish a general effect of education.

Predicted Salary 03: $149,786.45

Scenario 3:

  • Senior Data Analyst (7 Years of Experience, Master’s Degree)
  • Predicted Salary: $149,786.45

Relative to Scenario 2, six additional years of experience and a senior title raise the predicted salary by about $26,000. This aligns with industry trends where senior analysts with advanced degrees command higher pay due to their skill maturity and potential leadership responsibilities.

Predicted Salary 04: $162,616.71

Scenario 4:

  • Enterprise Architect (7 Years of Experience, Master’s Degree)
  • Predicted Salary: $162,616.71

Among all scenarios, this role has the highest salary estimate. The title “Enterprise Architect” along with senior-level experience and advanced education demonstrates the highest value according to the model, consistent with real-world compensation structures for such strategic positions.

Conclusion:

These predictions showcase how variables such as years of experience, minimum education level, and job title contribute to salary estimations. The model effectively differentiates salary bands across role types and qualifications, providing valuable insights for career planning and market benchmarking.